Similarity Measures for Nominal Variable Clustering

Author

  • Zdeněk Šulc
Abstract

This paper deals with selected similarity measures that can be used for hierarchical clustering of nominal variables. Such variables are common in questionnaire surveys. Cluster analysis can be applied when a reduction of the dataset size is desirable. The paper examines several similarity measures for nominal variable clustering that have been introduced in recent years. In contrast to the simple matching coefficient, which is considered a basic similarity measure, they take into account additional characteristics of the dataset, such as the frequency distribution of categories. They should therefore provide better results than the simple matching coefficient. The performance of clustering with the selected similarity measures is examined on two real datasets. Cluster quality is evaluated with indices based on within-cluster variability. All computations were performed in Matlab, IBM SPSS Statistics and MS Excel.
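As a rough illustration of the baseline mentioned in the abstract, the sketch below computes the simple matching coefficient in plain Python. It assumes the usual definition (the share of positions at which two equally long sequences of categories agree) and applies it to pairs of survey records. The names `simple_matching`, `dissimilarity_matrix` and the sample `records` are hypothetical and do not come from the paper, and the frequency-aware measures the paper examines are not reproduced here because the abstract does not name them.

```python
from typing import List, Sequence


def simple_matching(x: Sequence[str], y: Sequence[str]) -> float:
    """Share of positions at which two category sequences agree (0.0 to 1.0)."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    if not x:
        raise ValueError("sequences must contain at least one value")
    matches = sum(a == b for a, b in zip(x, y))
    return matches / len(x)


def dissimilarity_matrix(records: Sequence[Sequence[str]]) -> List[List[float]]:
    """Pairwise dissimilarities (1 - similarity), the usual input to hierarchical clustering."""
    n = len(records)
    return [
        [1.0 - simple_matching(records[i], records[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical questionnaire answers: three respondents, three nominal variables.
records = [
    ("yes", "urban", "often"),
    ("yes", "rural", "often"),
    ("no", "rural", "never"),
]

matrix = dissimilarity_matrix(records)
```

Every match counts equally here, regardless of how common a category is in the dataset. That is the limitation the abstract attributes to the simple matching coefficient.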


Similar resources

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and measuring their similarity is a core part of the MTS analysis process. Many research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...


New distance and similarity measures for hesitant fuzzy soft sets

The hesitant fuzzy soft set (HFSS), a combination of hesitant fuzzy sets and soft sets, is regarded as a useful tool for dealing with the uncertainty and ambiguity of real-world problems. In HFSSs, each element is defined in terms of several parameters with arbitrary membership degrees. In addition, distance and similarity measures are considered important tools in different areas such as ...


Taxonomy of Nominal Type Histogram Distance Measures

Abstract: Distance or similarity measures are of fundamental importance to pattern classification, clustering, and information retrieval problems. Various distance/similarity measures that are applicable to compare two nominal type histograms are reviewed and categorized in both syntactic and semantic relationships. A correlation coefficient and a hierarchical clustering technique are adopted t...


Cluster Analysis of Economic Data

This paper discusses some classical and recent approaches to cluster analysis. Over the last decades, researchers have focused mainly on categorical data clustering, uncertainty in cluster analysis, and clustering of large data sets. Some of the recently proposed techniques are introduced, such as similarity measures for data files with nominal variables, algorithms which include unc...


Presenting a Clustering Algorithm for Categorical Data by Combining Measures

Clustering is one of the main techniques in data mining. It is a process that partitions a data set into groups so that the data within a cluster are as similar to each other as possible and the data in different clusters are as different as possible. Clustering algorithms are divided into two categories according to the type of data: clustering algorithms for numerical data and clustering algor...




Publication date: 2014